This page last changed on Jan 21, 2014 by mbabik.
Initial setup
Host certificate
Install your host certificate and key to secure the web portal, and check that the certificate can be used for SSL client authentication:
$ ls -l /etc/grid-security/host*
-rw-r--r-- 1 root root 2286 Oct 28 19:26 /etc/grid-security/hostcert.pem
-r-------- 1 root root 887 Oct 28 19:25 /etc/grid-security/hostkey.pem
$ openssl x509 -in /etc/grid-security/hostcert.pem -noout -purpose | grep "SSL client"
SSL client : Yes
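The checks above can be extended with a few more sanity checks: the file modes shown in the listing, whether the certificate matches the key, and whether the certificate is about to expire. The sketch below uses hypothetical function names and assumes GNU `stat` and an unencrypted RSA host key (`openssl rsa` is used to read the key modulus).

```shell
# Hypothetical helpers for sanity-checking the host certificate and key.
# Assumptions: GNU coreutils stat, an unencrypted RSA private key.

# check_mode FILE MODE: succeed if FILE has exactly the octal permissions MODE.
check_mode() {
    local actual
    actual=$(stat -c '%a' "$1") || return 1
    [ "$actual" = "$2" ]
}

# cert_matches_key CERT KEY: succeed if the certificate and the private key
# share the same RSA modulus, i.e. they belong together.
cert_matches_key() {
    local cert_mod key_mod
    cert_mod=$(openssl x509 -noout -modulus -in "$1") || return 1
    key_mod=$(openssl rsa -noout -modulus -in "$2" 2>/dev/null) || return 1
    [ "$cert_mod" = "$key_mod" ]
}

# cert_valid_for SECONDS CERT: succeed if CERT does not expire within SECONDS.
cert_valid_for() {
    openssl x509 -noout -checkend "$1" -in "$2" >/dev/null
}
```

For example, `check_mode /etc/grid-security/hostkey.pem 400 && cert_matches_key /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem && cert_valid_for 604800 /etc/grid-security/hostcert.pem` should succeed on a correctly installed host (604800 seconds is one week).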
Disable SELinux
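SELinux must be disabled before proceeding with the installation. If it is enabled, run the commands below and reboot the machine (setenforce 0 only switches the running system to permissive mode; the setting in /etc/selinux/config takes effect after the reboot).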
$ setenforce 0
$ sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
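If you prefer a re-runnable version of the `sed` edit, the following sketch (hypothetical function names) keeps a one-off backup, rewrites only the `SELINUX=` line (not `SELINUXTYPE=`), and appends the setting if the line is missing. The file path is a parameter, so it can be tried on a copy first.

```shell
# disable_selinux_config [CONFIG]: set SELINUX=disabled in CONFIG
# (default /etc/selinux/config). Hypothetical helper; safe to run repeatedly.
disable_selinux_config() {
    local config="${1:-/etc/selinux/config}"
    if [ ! -f "$config" ]; then
        echo "no such file: $config" >&2
        return 1
    fi
    # Keep a backup of the original file the first time only.
    [ -f "$config.orig" ] || cp -p "$config" "$config.orig"
    if grep -q '^SELINUX=' "$config"; then
        sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config"
    else
        echo 'SELINUX=disabled' >> "$config"
    fi
}

# selinux_mode: print the current SELinux mode, or "absent" if getenforce
# is not installed.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo absent
    fi
}
```

Typical use is `disable_selinux_config && setenforce 0`, followed by a reboot; afterwards `selinux_mode` should print `Disabled`.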
Configuration of YUM Repositories
Scientific Linux v5 (SL5): set the following options in the existing SL5 repository definitions
[sl-base]
priority=2
protect=1
[sl-security]
priority=2
protect=1
[sl-fastbugs]
priority=2
protect=1
SAM/EGI repository: SAM repository
[egee-sa1-relx]
name=EGEE Packages from SA1 for CentOS5 corresponding to release 22
baseurl=http://rpm.hellasgrid.gr/mash/centos5-TOM-22/$basearch
enabled=1
gpgcheck=0
protect=1
priority=10
metadata_expire=1
Unified Middleware Distribution v2: UMD version 2
[UMD-2-base]
protect=1
priority=40
[UMD-2-updates]
protect=1
priority=40
EPEL: EPEL repository 5.4
[epel]
enabled=1
priority=50
CA: EGI trust anchors repository
SLC5-cernonly (needed for the Oracle clients, libraries and tnsnames)
[slc5-cernonly]
name=Scientific Linux CERN 5 (SLC5) CERN-only packages
baseurl=http://linuxsoft.cern.ch/onlycern/slc5X/$basearch/yum/cernonly/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-cern file:///etc/pki/rpm-gpg/RPM-GPG-KEY-jpolok
gpgcheck=1
enabled=0
protect=1
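With yum-priorities (installed in the Installation step), a lower priority value takes precedence, so the intended order is SL5 (2), SAM (10), UMD (40), EPEL (50). The sketch below, a hypothetical `awk`-based helper, prints every repository section with its effective priority and enabled flag so the order can be checked at a glance. It assumes the plugin's default of 99 for sections without `priority` and yum's default of `enabled=1`.

```shell
# repo_priorities [DIR]: print "priority section enabled" for every section of
# the .repo files in DIR (default /etc/yum.repos.d), lowest priority first.
# Hypothetical helper; a missing priority is shown as 99 (the yum-priorities
# default) and a missing enabled flag as 1 (the yum default).
repo_priorities() {
    local dir="${1:-/etc/yum.repos.d}"
    awk '
        function flush(    p, e) {
            if (sec == "") return
            p = (prio == "") ? 99 : prio
            e = (en == "") ? 1 : en
            print p, sec, e
        }
        # A new file starts a fresh context.
        FNR == 1 { flush(); sec = "" }
        /^\[.*\]/ { flush(); sec = substr($0, 2, index($0, "]") - 2); prio = ""; en = ""; next }
        /^priority[ \t]*=/ { sub(/^[^=]*=[ \t]*/, ""); prio = $0 }
        /^enabled[ \t]*=/ { sub(/^[^=]*=[ \t]*/, ""); en = $0 }
        END { flush() }
    ' "$dir"/*.repo | sort -k1,1n -k2,2
}
```

Run without arguments, it reads `/etc/yum.repos.d`; the SL5 sections should be listed first, and `slc5-cernonly` should show `0` in the last column because it is only enabled on demand with `--enablerepo`.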
Installation
- Install required packages
$ yum -y install yum-priorities yum-protectbase python-setuptools
# Oracle libraries and clients - might be available from other sources; note this installation assumes Oracle connections via tnsnames
$ yum --enablerepo=slc5-cernonly -y install oracle-instantclient-basic oracle-instantclient-sqlplus perl-DBD-Oracle cx_Oracle
# python-meld3
$ wget http://koji.afroditi.hellasgrid.gr/packages/python-meld3/0.6.7/1.el5/x86_64/python-meld3-0.6.7-1.el5.x86_64.rpm
$ yum localinstall python-meld3-0.6.7-1.el5.x86_64.rpm
$ yum install lcg-CA # ca-policy-egi-core might work as well
- Install SAM-Gridmon
$ yum install sam-gridmon --exclude sam-nagios
$ yum -y update --exclude sam-nagios
- Make sure sqlplus works. You may need to add the Oracle client library directory to the dynamic linker search path:
$ echo /usr/lib64/oracle/10.x.x.x/client/lib64 >> /etc/ld.so.conf.d/oracle-instantclient.conf
$ ldconfig
- Check the Release Notes for any additional instructions
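The sqlplus library-path step appends a line every time it is run. A re-runnable variant is sketched below (hypothetical function name); the Oracle client directory is passed as an argument because the exact `10.x.x.x` path depends on the installed version.

```shell
# add_ld_path LIBDIR [CONF]: add LIBDIR to the ld.so configuration file CONF
# (default /etc/ld.so.conf.d/oracle-instantclient.conf) unless it is already
# listed. Hypothetical helper; run ldconfig afterwards.
add_ld_path() {
    local libdir="$1"
    local conf="${2:-/etc/ld.so.conf.d/oracle-instantclient.conf}"
    if [ ! -d "$libdir" ]; then
        echo "not a directory: $libdir" >&2
        return 1
    fi
    # Append only if the exact line is not already present.
    grep -qxF "$libdir" "$conf" 2>/dev/null || echo "$libdir" >> "$conf"
}
```

After `ldconfig`, a quick check is `ldd "$(command -v sqlplus)" | grep 'not found'`: if it prints nothing, the Oracle libraries resolve.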
Configuration
- Default configuration
The configuration specification is available in the SAM documentation. To properly configure your instance, please go through:
- Common configuration options
- SAM-Gridmon specific options
- Additional configuration
SAM releases often come with additional configuration steps specific to each release.
- Please check the Release Notes of the latest production release.
- Check the FAQs for common configurations and problems.
- Consult bootstrapping steps if this is an installation from scratch
- Database deployment
In order to perform the database deployment, the following variables are needed:
DB_USER, DB_USER_R, DB_USER_W, DB_PASS, DB_PASS_W, DB_PASS_R and DB_NAME
DB_USER is the main account; during the DB deployment, synonyms to DB_USER_W and DB_USER_R are created.
The corresponding read/write permissions must be set up by the DBA in advance.
The database deployment is not performed automatically as part of yaim. The following yaim function should be executed manually:
$ /opt/glite/yaim/bin/yaim -r -s /etc/lcg-quattor-site-info.def -n sam_gridmon -f config_database
- Run yaim
$ /opt/glite/yaim/bin/yaim -s /etc/yaim/site-info.def -n SAM_GRIDMON
- Consult bootstrapping steps if this is an installation from scratch
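Before running `config_database`, it can save a failed run to confirm that the site-info.def file passed with `-s` defines all of the DB_* variables listed under Database deployment. A minimal sketch (hypothetical function name), assuming site-info.def consists of plain shell assignments that can be sourced:

```shell
# check_db_vars SITE_INFO: print "missing: VAR" for every required database
# variable that is unset or empty in SITE_INFO; return non-zero if any are.
# Hypothetical helper; the file is sourced in a subshell so nothing leaks
# into the calling shell.
check_db_vars() {
    (
        . "$1" || exit 1
        missing=0
        for v in DB_USER DB_USER_R DB_USER_W DB_PASS DB_PASS_R DB_PASS_W DB_NAME; do
            # Indirect lookup of the variable named by $v.
            eval "val=\${$v:-}"
            if [ -z "$val" ]; then
                echo "missing: $v" >&2
                missing=1
            fi
        done
        exit "$missing"
    )
}
```

For example, `check_db_vars /etc/lcg-quattor-site-info.def && /opt/glite/yaim/bin/yaim -r -s /etc/lcg-quattor-site-info.def -n sam_gridmon -f config_database` only starts the deployment when every variable is present.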
Optional: latest metric per VO alarm
To enable an alarm that checks the arrival of metrics per VO, download the create_vo_alarm.sql script and run it on the master Oracle account.
Bootstrapping
- Consult a sample site-info.def used at CERN.
- Download the SAM Oracle DB administration scripts
- Three Oracle accounts are needed (owner, reader and writer); the grant_priviledges.sql script needs to be run on the owner account
- Download http://gridops.cern.ch/config/nagios-roles.conf and store it on the host so that it can be downloaded via the N2MS_ROLES_URL link defined in site-info.def
- OSG MRS bootstrapping is needed if you want to import monitoring results from the RSV2SAM bridge (run it after config_database):
$ sqlplus <db_account/db_pass@db_service> @/usr/share/doc/mrs-1.7.43/DBScripts/oracle_bootstrap/OSG_Bootstrapper.sql
- The initial yaim run will fail during POEM configuration, since no profiles are defined by default, but it still needs to be executed to perform the initial setup of ATP and POEM. To continue with the configuration, run the following commands to import profiles from other SAM instances (or create initial profiles manually at http://<sam_gridmon>/poem/admin):
$ export DJANGO_SETTINGS_MODULE=Poem.settings
$ django-admin createsuperuser # sets up root access to http://<sam_gridmon>/poem/admin, needed to manage roles and profiles
$ django-admin import_profiles --url http://grid-monitoring.cern.ch/poem/api/0.2/json/profiles/ ROC ROC_CRITICAL ROC_OPERATORS GLEXEC CLOUD-MON
$ /opt/glite/yaim/bin/yaim -s /etc/yaim/site-info.def -n SAM_GRIDMON
- After a successful yaim run, consider removing the following consumers:
$ rm -f /etc/msg-consume2db/msg-consume2db-1.conf /etc/msg-consume2db/msg-consume2db-2.conf # OSG consumers
$ rm -f /etc/msg-consume2db/msg-consume2db-3.conf # consumers from VO Nagioses
$ service supervisor reload # make sure you have only msg-consume2db processes that you configured
- After a successful yaim run, enable the MRS job using the SAM Oracle DB administration scripts:
$ sqlplus <db_account/db_pass@db_service> @enable_jobs.sql
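Several steps on this page run SQL scripts through sqlplus (grant_priviledges.sql, OSG_Bootstrapper.sql, enable_jobs.sql, and create_vo_alarm.sql in the optional section). A small wrapper such as the hypothetical `run_sql_script` below makes a failing statement visible in the exit status by prepending the SQL*Plus command `WHENEVER SQLERROR EXIT FAILURE`; the connection string is the same `<db_account/db_pass@db_service>` placeholder used above.

```shell
# run_sql_script CONN SCRIPT: run SCRIPT with sqlplus and return non-zero if
# any SQL or PL/SQL statement in it fails. Hypothetical helper.
run_sql_script() {
    local conn="$1"
    local script="$2"
    if [ ! -r "$script" ]; then
        echo "cannot read $script" >&2
        return 1
    fi
    # -S suppresses the banner; the commands are fed on stdin.
    sqlplus -S "$conn" <<EOF
whenever sqlerror exit failure
@$script
exit
EOF
}
```

For example, `run_sql_script '<db_account/db_pass@db_service>' enable_jobs.sql || echo "enable_jobs.sql failed"`.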
Validation
- Check that the messages are consumed (successful inserts are reported in):
$ less +F /var/log/supervisor/msg-consume2db-0.log
- Check that Oracle job is running (either via Oracle web interface or indirectly via query):
-- should be decreasing if messages are consumed (the job runs every 3 minutes)
select count(*) from metricdata_spool;
- Check that the web interface is up (note that it takes at least 30 minutes before anything can be seen in the web interface):
- https://<hostname>/mywlcg or
- https://<hostname>/myegi
- Manually check status and availability is computed
-- there should be entries with a status other than missing
select * from statuschange_metric_service;
# availability
$ less +F /var/log/ace/ace_status.log # or /var/log/ace/ace_availability.log
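The spool and web interface checks above can be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, CONN is the `<db_account/db_pass@db_service>` string used earlier, `curl` is available on the checking host, and "the count never grew between samples" is used as a rough proxy for "decreasing" (new results keep arriving, so a steady count can also be normal).

```shell
# spool_count CONN: print the number of rows waiting in metricdata_spool.
spool_count() {
    sqlplus -S "$1" <<'EOF' | tr -d '[:space:]'
set heading off feedback off pagesize 0
select count(*) from metricdata_spool;
exit
EOF
}

# non_increasing N1 N2 ...: succeed if every number is <= the one before it.
non_increasing() {
    local prev="$1" n
    shift
    for n in "$@"; do
        [ "$n" -le "$prev" ] || return 1
        prev="$n"
    done
}

# watch_spool CONN [SAMPLES] [INTERVAL]: sample the spool SAMPLES times
# (default 3), INTERVAL seconds apart (default 180, the job period), print
# each sample and succeed if the count never grew.
watch_spool() {
    local samples="" i=0 c
    while [ "$i" -lt "${2:-3}" ]; do
        if [ "$i" -gt 0 ]; then sleep "${3:-180}"; fi
        c=$(spool_count "$1")
        case $c in
            ''|*[!0-9]*) echo "unexpected sqlplus output: $c" >&2; return 2 ;;
        esac
        echo "$(date '+%H:%M:%S') metricdata_spool=$c"
        samples="$samples $c"
        i=$((i + 1))
    done
    # Word splitting of $samples is intended here.
    non_increasing $samples
}

# portal_up HOST: succeed if /mywlcg or /myegi answers with a 2xx/3xx status.
# Verifying against /etc/grid-security/certificates assumes the CA files were
# installed there (lcg-CA or ca-policy-egi-core).
portal_up() {
    local path code
    for path in mywlcg myegi; do
        code=$(curl -s -o /dev/null -w '%{http_code}' \
               --capath /etc/grid-security/certificates "https://$1/$path")
        echo "$path: $code"
        case $code in 2??|3??) return 0 ;; esac
    done
    return 1
}
```

With the defaults, `watch_spool '<db_account/db_pass@db_service>'` takes about six minutes; `portal_up <hostname>` can be run right away, although the pages stay empty for the first 30 minutes.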